Smart Paper Notes

C2F-TCN: A Framework for Semi and Fully Supervised Temporal Action Segmentation

Dipika Singhania, Rahul Rahaman, Angela Yao

Category: Computer Vision

2022-12-20

Temporal action segmentation assigns an action label to every frame of an untrimmed input video that contains a sequence of multiple actions. For this task, we propose an encoder-decoder-style architecture named C2F-TCN featuring a "coarse-to-fine" ensemble of decoder outputs. The C2F-TCN framework is enhanced with a novel, model-agnostic temporal feature augmentation strategy based on computationally inexpensive stochastic max-pooling of segments. It produces more accurate and well-calibrated supervised results on three benchmark action segmentation datasets. We show that the architecture is flexible for both supervised and representation learning. In line with this, we present a novel unsupervised way to learn frame-wise representations from C2F-TCN. Our unsupervised learning approach hinges on the clustering capabilities of the input features and the formation of multi-resolution features from the decoder's implicit structure. Further, we provide the first semi-supervised temporal action segmentation results by merging representation learning with conventional supervised learning. Our semi-supervised learning scheme, called "Iterative-Contrastive-Classify (ICC)", progressively improves in performance with more labeled data. ICC semi-supervised learning in C2F-TCN, with 40% labeled videos, performs similarly to its fully supervised counterparts.
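
The abstract names two mechanisms without spelling them out: the stochastic max-pooling augmentation and the coarse-to-fine ensemble of decoder outputs. The Python/NumPy sketch below is a minimal illustration under stated assumptions, not the authors' implementation: segment boundaries are drawn uniformly at random, decoder stages are upsampled by nearest-neighbour repetition, and all stages are averaged with equal weight. The function names `stochastic_max_pool` and `coarse_to_fine_ensemble` are hypothetical.

```python
import numpy as np


def stochastic_max_pool(features: np.ndarray, num_segments: int,
                        rng: np.random.Generator) -> np.ndarray:
    """Downsample a (T, D) frame-feature sequence to (num_segments, D).

    Assumption (hypothetical): segment boundaries are sampled uniformly at
    random, and each segment is summarized by the element-wise max of its frames.
    """
    T = features.shape[0]
    if not 1 <= num_segments <= T:
        raise ValueError("num_segments must lie in [1, T]")
    # Draw num_segments - 1 distinct interior cut points so that every
    # segment contains at least one frame.
    cuts = np.sort(rng.choice(np.arange(1, T), size=num_segments - 1, replace=False))
    bounds = np.concatenate(([0], cuts, [T]))
    return np.stack([features[s:e].max(axis=0)
                     for s, e in zip(bounds[:-1], bounds[1:])])


def coarse_to_fine_ensemble(stage_logits: list, T: int) -> np.ndarray:
    """Average per-frame class probabilities over decoder stages.

    Each element of stage_logits is a (T_k, C) array from a decoder stage
    with its own temporal resolution T_k. Assumptions (hypothetical):
    nearest-neighbour upsampling to T frames and equal stage weights.
    """
    probs = []
    for logits in stage_logits:
        # Map each output frame t to source index floor(t * T_k / T).
        idx = (np.arange(T) * logits.shape[0]) // T
        up = logits[idx]
        up = up - up.max(axis=1, keepdims=True)  # numerically stable softmax
        e = np.exp(up)
        probs.append(e / e.sum(axis=1, keepdims=True))
    return np.mean(probs, axis=0)  # (T, C), each row sums to 1


rng = np.random.default_rng(0)
x = rng.normal(size=(100, 16))            # 100 frames, 16-dim features
x_aug = stochastic_max_pool(x, 40, rng)   # one random 40-step view of the video
print(x_aug.shape)                        # (40, 16)
```

Calling `stochastic_max_pool` repeatedly with different random cuts yields differently downsampled views of the same video, which is one plausible reading of why such an augmentation is cheap: it needs only a sort and a max per segment.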

translated by Google Translate

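In temporal action segmentation, timestamp supervision requires only a handful of labeled frames per video sequence. For unlabeled frames, previous works rely on assigning hard labels, and their performance collapses rapidly under subtle violations of the annotation assumptions. We propose a novel Expectation-Maximization (EM) based approach that leverages the label uncertainty of unlabeled frames and is robust enough to accommodate possible annotation errors. With accurate timestamp annotations, our proposed method produces SOTA results and even exceeds the fully supervised setup on several metrics and datasets. When applied to timestamp annotations with missing action segments, our method shows stable performance. To further test the robustness of our formulation, we introduce a new, challenging annotation setup: Skip-Tag supervision. This setup relaxes the constraints and requires annotating any fixed number of random frames in a video, making it more flexible than timestamp supervision while remaining competitive.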
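
To make "leveraging the label uncertainty of unlabeled frames" concrete, here is a simplified E-step in Python/NumPy. It assumes, as a hypothetical simplification, that exactly one action boundary lies between two consecutive timestamps labeled with different actions a and b. Given the current model's per-frame log-probabilities for a and b, it computes the posterior over the boundary position and turns it into soft frame labels instead of hard ones. The function names are illustrative, and this is not claimed to be the paper's exact formulation; an M-step would then train the segmentation model on these soft labels.

```python
import numpy as np


def boundary_posterior(log_pa: np.ndarray, log_pb: np.ndarray) -> np.ndarray:
    """Posterior over the boundary t in {1, ..., L-1} between two timestamps.

    Frames 0..L-1 span the interval from timestamp a (frame 0) to timestamp b
    (frame L-1). Boundary t means frames [0, t) are action a and [t, L) are b.
    log_pa[k], log_pb[k]: model log-probabilities of a and b at frame k.
    """
    L = log_pa.shape[0]
    if L < 2:
        raise ValueError("need at least the two timestamp frames")
    ca = np.concatenate(([0.0], np.cumsum(log_pa)))  # ca[t] = sum_{k<t} log_pa[k]
    cb = np.concatenate(([0.0], np.cumsum(log_pb)))
    t = np.arange(1, L)
    log_score = ca[t] + (cb[L] - cb[t])               # log-likelihood of boundary t
    log_score -= log_score.max()                      # stabilize before exp
    post = np.exp(log_score)
    return post / post.sum()                          # post[t - 1] = P(boundary = t)


def soft_labels(log_pa: np.ndarray, log_pb: np.ndarray) -> np.ndarray:
    """Return an (L, 2) array of [P(frame is a), P(frame is b)]."""
    post = boundary_posterior(log_pa, log_pb)
    # Frame k is action a iff the boundary t > k, i.e. the sum of post[k:].
    p_a = np.concatenate((np.cumsum(post[::-1])[::-1], [0.0]))
    return np.stack([p_a, 1.0 - p_a], axis=1)


# The model is fairly sure the action changes after the third frame.
log_pa = np.log([0.9, 0.9, 0.9, 0.2, 0.1, 0.1])
log_pb = np.log([0.1, 0.1, 0.1, 0.8, 0.9, 0.9])
print(np.round(soft_labels(log_pa, log_pb)[:, 0], 3))
```

Frames near a confidently predicted boundary receive labels close to 0 or 1, while frames in an ambiguous stretch keep intermediate values, so a misplaced hard label cannot dominate training.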

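Temporal action segmentation classifies the action in every frame of (long) video sequences. Because frame-wise labels are costly, we propose the first semi-supervised method for temporal action segmentation. Our method hinges on unsupervised representation learning, which poses unique challenges for temporal action segmentation: actions in untrimmed videos vary in length and have unknown labels and start/end times, and the ordering of actions may also differ across videos. We propose a novel way to learn frame-wise representations from a temporal convolutional network (TCN) by clustering input features with added time-proximity conditions and multi-resolution similarity. By merging representation learning with conventional supervised learning, we develop an "Iterative-Contrastive-Classify (ICC)" semi-supervised learning scheme. ICC progressively improves in performance with more labeled data; with 40% labeled videos, ICC semi-supervised learning performs similarly to its fully supervised counterparts. With 100% labeled videos, ICC improves performance by {+1.8, +5.6, +2.5}%, respectively, on the evaluated datasets.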
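
Learning frame-wise representations "by clustering input features with added time-proximity conditions" can be illustrated with a supervised-contrastive (SupCon-style) loss. In this Python/NumPy sketch, which is an assumption-laden illustration rather than the ICC loss itself, two frames count as positives when they share a k-means cluster ID of the input features (assumed to be computed elsewhere) and lie at most `max_gap` frames apart; every other frame appears in the softmax denominator. The multi-resolution similarity term is omitted, and `max_gap`, `temperature` and the function name are hypothetical.

```python
import numpy as np


def time_proximal_contrastive_loss(z: np.ndarray, cluster_ids: np.ndarray,
                                   max_gap: int, temperature: float = 0.1) -> float:
    """SupCon-style loss over T frame embeddings z of shape (T, D).

    Positives for anchor i: frames j != i with the same cluster ID and
    |i - j| <= max_gap (hypothetical time-proximity condition).
    """
    T = z.shape[0]
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    logits = (z @ z.T) / temperature
    np.fill_diagonal(logits, -np.inf)                   # exclude self-pairs
    # Row-wise log-softmax over all other frames.
    row_max = logits.max(axis=1, keepdims=True)
    log_den = row_max + np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    log_prob = logits - log_den

    idx = np.arange(T)
    near = np.abs(idx[:, None] - idx[None, :]) <= max_gap
    pos = (cluster_ids[:, None] == cluster_ids[None, :]) & near
    np.fill_diagonal(pos, False)

    has_pos = pos.any(axis=1)                           # anchors with >= 1 positive
    if not has_pos.any():
        raise ValueError("no anchor has a positive pair")
    pos_sum = np.where(pos, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_sum[has_pos] / pos.sum(axis=1)[has_pos]
    return float(per_anchor.mean())
```

Lowering `max_gap` restricts positives to temporally adjacent frames of the same cluster, which encodes the intuition that nearby frames with similar input features likely belong to the same action, while distant frames of the same cluster (possibly a repeated action) are not forced together.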

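This paper introduces the new task of predicting the coverage of a text document for relation extraction (RE): does the document contain many relational tuples for a given entity? Coverage predictions are useful for selecting the best documents for knowledge base (KB) construction from large input corpora. To study this problem, we present a dataset of 31,366 diverse documents for 520 entities. We analyze the correlation of document coverage with features such as length, entity mention frequency, Alexa rank, language complexity, and information retrieval scores. Each of these features has only moderate predictive power. We employ methods that combine features with statistical models such as TF-IDF and language models such as BERT. The model combining features and BERT, HERB, achieves an F1 score of up to 46%. We demonstrate the utility of coverage prediction in two use cases: KB construction and claim refutation.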
